Abstract

We present two diagnostic methods based on ideas of Principal Com- 
ponent Analysis and demonstrate their efficiency for sophisticated pro- 
cessing of multicolour photometric observations of variable objects. 

Keywords: variable stars: light curves; methods: Principal Component
Analysis, Least Squares Method, robust regression.

1 Introduction 

In the last few decades, the volume and general accessibility of
high-quality observational data on variable stars have boomed; however,
the standard methods of data processing and interpretation have lagged
behind this progress. One way to overcome this discrepancy is the
consistent application of Principal Component Analysis (PCA) combined
with robust regression (RR), factor analysis, wavelet analysis and other
sophisticated approaches to the treatment of variable star observations.

The method commonly used for the treatment of astrophysical data is the
simple (unweighted) Least Squares Method (LSM). As these data usually
suffer from outliers and very uneven quality, the method yields
questionable and misleading results. Robust regression, an adequate
alternative to the standard LSM, is used only seldom, and LSM weights,
if introduced at all, are often assigned without proper justification.



2 Standard and Weighted PCA 

Principal component analysis is one of the oldest and most thoroughly
elaborated methods for the treatment of statistical data. PCA can be used
to simplify a data set without substantial loss of information. It is a
linear transformation that chooses a new coordinate system such that the
greatest variance corresponds to the first axis, the greatest remaining
variance to the second one, etc. PCA is simple and straightforward, and
it does not need any model. It reduces the number of uncorrelated
parameters necessary for the description of the data set, helps to reveal
hidden relationships and effectively suppresses noise. For more details
see e.g. [1] or [2].

At present it is widely used, notably in image processing, political
science, criminology, sociology and other human sciences; in astronomy,
however, it is almost unknown. We shall demonstrate how to apply the
standard PCA to routine tasks in the processing of variable star
observations.

Suppose we have p photometric measurements obtained in q photometric
colours, which we can arrange in the form of p row vectors with q
components: $\{y_1, y_2, \ldots, y_p\}$, $y_i = [y_{i1}, y_{i2}, \ldots,
y_{iq}]$, or into the $p \times q$ matrix $Y$. Each measurement can then
be described as a point in the q-D space; all p observations represent a
"cloud" of points, whose global characteristics we will study by means of
the standard PCA.

To use PCA as effectively as possible, we linearly transform the
components of these data vectors into new variables
$\{z_1, z_2, \ldots, z_p\}$:

$$z_{ij} = \frac{y_{ij} - \bar{y}_j}{s_j}, \tag{1}$$

where $\bar{y}_j$ is the mean value of the j-th component (j-th colour)
and $s_j$ is the estimate of the mean (typical) error (uncertainty) of the
j-th component. The purpose of this transformation is to place the centre
of the data cloud at the origin of the new coordinate system and to put
all coordinates on an equal footing. PCA here implicitly assumes that at
least the ratios among the errors of measurements in the various colours
are roughly constant. The "error boxes" of particular measurements in q-D
space should then have the form of spheres of unit radius.
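The transformation of Eq. (1) can be sketched in a few lines of NumPy. This is only an illustration: the function name `standardize`, the toy matrix `Y` and the per-colour errors `s` are assumptions made for the example, not values from the paper.

```python
import numpy as np

def standardize(Y, s):
    """Apply Eq. (1): z_ij = (y_ij - mean_j) / s_j for a p x q data matrix Y."""
    Y = np.asarray(Y, dtype=float)
    s = np.asarray(s, dtype=float)
    y_mean = Y.mean(axis=0)          # mean of each colour (column)
    Z = (Y - y_mean) / s             # broadcasting over the p rows
    return Z, y_mean

# Hypothetical toy data: p = 3 measurements in q = 2 colours.
Y = np.array([[1.0, 10.0],
              [2.0, 14.0],
              [3.0, 12.0]])
s = np.array([0.5, 2.0])             # assumed typical errors per colour
Z, y_mean = standardize(Y, s)
```

The returned `y_mean` is kept so that smoothed vectors can later be transformed back to magnitudes as `Z_s * s + y_mean`.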

The standard PCA can easily be extended to Weighted Principal Component
Analysis (WPCA) by introducing weights of the individual data vectors.
The weight $w_i$ is taken to be inversely proportional to the square of
$\varepsilon_i$: $w_i \sim \varepsilon_i^{-2}$, where $\varepsilon_i$ is
the expected uncertainty of a component of the i-th data vector $z_i$.
Let $w = [w_1, w_2, \ldots, w_p]$ be a vector describing the weights of
the individual data vectors; the diagonal matrix of weights $W$ of size
$p \times p$ is then defined as $W = \mathrm{diag}(w)$. In our q-D
representation this corresponds to allowing the error spheres of
individual sets of multicolour measurements to have various effective
radii, proportional to $\varepsilon_i$. The standard PCA is then a special
case of WPCA with equal weights, $W \sim I_p$.
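As a small illustration of the weighting rule, the sketch below builds the diagonal matrix W from assumed uncertainties. The helper name `weight_matrix` and the normalization to unit mean weight are choices made for this example only; the text fixes the weights just up to a common factor.

```python
import numpy as np

def weight_matrix(eps):
    """Diagonal p x p matrix W with w_i proportional to eps_i**-2."""
    eps = np.asarray(eps, dtype=float)
    w = eps ** -2.0
    w = w / w.mean()                 # assumed normalization: mean weight = 1
    return np.diag(w)

# Equal uncertainties reproduce the standard PCA case, W ~ I_p.
W_equal = weight_matrix([0.02, 0.02, 0.02])
# A measurement with twice the uncertainty gets a quarter of the weight.
W_mixed = weight_matrix([0.01, 0.01, 0.02])
```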

The above-mentioned PCA linear transformation of a vector $z$ to a
smoothed vector $z_s$ by the action of the smoothing $q \times q$ matrix
$\mathcal{A}$ can be written as:

$$z_s = z\,\mathcal{A} = z\,(A A^T), \qquad y_s = [z_{s1} s_1 + \bar{y}_1, \ldots, z_{sq} s_q + \bar{y}_q], \tag{2}$$

where $A$ is the $q \times r$ matrix consisting of r columns of normalized
eigenvectors $a_i$ of the symmetric definite $q \times q$ matrix
$Z^T W Z$, and $Z$ is the $p \times q$ data matrix,
$Z = [z_1; z_2; \ldots; z_p]$. As follows from the definition, each
eigenvector $a_i$ together with the corresponding eigenvalue $\lambda_i$
obeys the relation:

$$(Z^T W Z)\, a_i = a_i\, \lambda_i. \tag{3}$$
It can be proven that for the $q \times q$ matrix $Z^T W Z$ exactly q
eigenvalues and q normalized eigenvectors exist. The whole set of q
eigenvectors forms an orthonormal vector basis. Let us order the
eigenvectors according to their eigenvalues, from the largest to the
smallest, into the sequence $\{a_1, \ldots, a_q\}$. Now we take the first
r (r < q) eigenvectors and join them into the matrix
$A = [a_1, \ldots, a_r]$. Their eigenvalues then form the diagonal of the
$r \times r$ diagonal matrix
$\Lambda = \mathrm{diag}([\lambda_1, \ldots, \lambda_r])$:

$$(Z^T W Z)\, A = A\, \Lambda; \qquad a_i \cdot a_j = \delta_{ij}, \qquad A^T A = I, \tag{4}$$

where $\delta_{ij}$ is the Kronecker delta and $I$ is the $r \times r$
identity matrix. The vectors $\{a_1, a_2, \ldots, a_r\}$ contained in $A$
represent an orthonormal vector basis of the r-D subspace embedded in the
q-D space. The ordered set of scalar products of a vector $z$ with the
vectors $\{a_i\}$, $\{k_1, k_2, \ldots, k_r\}$, where $k_i = z \cdot a_i$,
defines a vector $k$:

$$k = z A; \qquad z_s = k A^T = z A A^T, \qquad Z_s = K A^T = Z A A^T. \tag{5}$$

We can introduce the $p \times r$ matrix $K$,
$K = [k_1; \ldots; k_p]$. Using Eq. (4) we can write:

$$K = Z A \;\Rightarrow\; A^T (Z^T W Z)\, A = K^T W K = (A^T A)\, \Lambda = \Lambda. \tag{6}$$

Equation (6) shows that the eigenvalues correspond to the sums of the
weighted variances of the projections of all vectors $z_i$, and gives the
reason why we should confine ourselves only to those components whose
eigenvalues are sufficiently large: the others do not contain any true
information, they represent only noise and so can be trimmed.
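The chain of relations (3) to (6) can be checked numerically. The following is a minimal sketch, assuming standardized data `Z` and weights `w` are already available; the function name `wpca` and the synthetic data are hypothetical, and `numpy.linalg.eigh` is used as one convenient eigen-solver for the symmetric matrix $Z^T W Z$.

```python
import numpy as np

def wpca(Z, w, r):
    """Weighted PCA: eigenvectors of Z^T W Z and the smoothed data Z A A^T."""
    Z = np.asarray(Z, dtype=float)
    W = np.diag(np.asarray(w, dtype=float))
    M = Z.T @ W @ Z                          # symmetric q x q matrix
    lam, vec = np.linalg.eigh(M)             # eigenvalues in ascending order
    order = np.argsort(lam)[::-1][:r]        # indices of the r largest
    A = vec[:, order]                        # q x r, orthonormal columns
    Lam = lam[order]                         # the r largest eigenvalues
    K = Z @ A                                # p x r projections, Eq. (5)
    Z_s = K @ A.T                            # smoothed vectors, Eq. (5)
    return A, Lam, K, Z_s

# Synthetic example: p = 50 vectors in q = 5 dimensions, r = 2 components.
rng = np.random.default_rng(0)
p, q, r = 50, 5, 2
signal = rng.normal(size=(p, r)) @ rng.normal(size=(r, q))
Z = signal + 0.05 * rng.normal(size=(p, q))
Z -= Z.mean(axis=0)                          # centring as in Eq. (1)
w = np.ones(p)                               # equal weights: standard PCA
A, Lam, K, Z_s = wpca(Z, w, r)
```

With this construction `K.T @ np.diag(w) @ K` reproduces `np.diag(Lam)`, which is the content of Eq. (6).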

The application of PCA and WPCA should help us to find the number of
parameters essential for the description of the variability (the number
of mechanisms of variability in action), and it enables us to examine the
relative quality of observations in multicolour measurements. Though we
do not know the $s_j$ of the individual colours exactly, we can improve
them very quickly using an iterative cycle. This process converges well,
because the results are as a rule insensitive to the $s_j$ used.

The above-mentioned methods are helpful mainly in the preliminary
processing of observational data, when we want to get oriented in the
nature of the variability of the studied objects, the possible
relationships among the measured quantities, and their quality. All this
information can be gained without using any physical model or
time-dependent smoothing, which can strongly hamper the discovery of
a priori unexpected types of variability (rapid variations, trends etc.).

We demonstrate the PCA treatment on artificial photometric data (50
observations in 5 colours) simulating the light variability of a rotating
CP star with two differently coloured photometric spots. The "observed"
points and the smoothed points with suppressed noise for r = 2 are
displayed for the individual colours in the phase diagram in Fig. 1. The
treatment did not take the phase information into account.

Figure 1: Smoothing of "observational" data (full dots) by the standard
PCA method. Smoothed points are denoted by circles; the dashed lines
represent the original noise-free light curves in five synthetic colours.

PCA methods, like LSM, suffer from outliers, which are quite common in
astrophysical data. Introducing weights in PCA enables us to eliminate
their influence by means of an iterative process adjusting the individual
weights of the input data (see e.g. the appendix of [7]).
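One possible form of such an iterative reweighting is sketched below. It is not the scheme of [7], which is not reproduced here: the Cauchy-type weight function, the tuning constant `c`, the median-based scale and the name `robust_wpca` are all assumptions chosen for illustration.

```python
import numpy as np

def robust_wpca(Z, r, n_iter=10, c=2.0):
    """Iteratively reweighted PCA that down-weights vectors far from the r-D subspace."""
    Z = np.asarray(Z, dtype=float)
    w = np.ones(Z.shape[0])                        # start from equal weights
    for _ in range(n_iter):
        lam, vec = np.linalg.eigh(Z.T @ (w[:, None] * Z))   # Z^T W Z
        A = vec[:, np.argsort(lam)[::-1][:r]]      # r leading eigenvectors
        d = np.linalg.norm(Z - Z @ A @ A.T, axis=1)  # residual of each vector
        sigma = 1.4826 * np.median(d) + 1e-12      # robust scale (assumed form)
        w = 1.0 / (1.0 + (d / (c * sigma)) ** 2)   # assumed weight function
    return A, w

# Hypothetical test case: 21 points on a line through the origin + 1 outlier.
u = np.ones(3) / np.sqrt(3.0)
t = np.arange(-10.0, 11.0)
Z = np.vstack([np.outer(t, u), [0.0, 0.0, 3.0]])
A, w = robust_wpca(Z, r=1)
```

In this toy case the last row lies off the line and ends up with a negligible weight, while the points on the line keep weights close to one.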

3 Advanced PCA

The applicability of the standard and weighted PCA methods is rather
limited, as they are demanding with respect to the completeness and
homogeneity of the input data. These limitations were evidently among the
decisive reasons why PCA techniques remain beyond the scope of the
majority of observing astronomers.

Since 2000 we have been developing a qualitatively new method that
synthesizes weighted PCA and robust regression. We will denote it as
Advanced PCA (APCA). APCA has proved to be quite versatile and has been
used several times, see e.g. [1], [5], [8]; however, it has not been
fully described up to now. We will briefly present only the method,
without its derivation and without strict mathematical proofs of lemmas
or statements.

3.1 Vector description of light curves 

Let the course of a light curve be described by means of a preselected
model whose parameters are determined by standard regression methods,
such as LSM with weights or its modification eliminating the influence of
outliers. It is advisable to use a linear model, so that the course of a
light curve in a certain colour c, $m_c(t)$, is described by a linear
combination of an ensemble of q so-called elementary functions
$f_1(t), f_2(t), \ldots, f_q(t)$, which define the time dependence through
the column vector of length q,
$f(t) = [f_1(t); f_2(t); \ldots; f_q(t)]$, by the relation:

$$\Delta m_c(t) = m_c(t) - \bar{m}_c = \sum_{i=1}^{q} y_{ci}\, f_i(t) = y_c\, f(t), \tag{7}$$

where $y_{ci}$ are the components of a row vector $y_c$ and $\bar{m}_c$
is the mean magnitude in the colour c. The components of the vector are
found from the observational data by standard regression procedures
(weighted LSM, robust regression).

We should be very careful about the choice of the basis of elementary
functions $f_1(t), f_2(t), \ldots, f_q(t)$. The functions should be
selected so that they enable us to express the courses of all studied
light curves of the object with sufficient accuracy. It is advisable for
many reasons (avoiding problems with multicollinearity, equality of the
uncertainties of the components of the vector $y_c$) to choose the
elementary functions so that they form a basis that is quasi-orthonormal
on the set of data, which means:



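$$\overline{f_i^2} = \overline{f_j^2} = \overline{f^2}; \qquad \overline{f_i f_j} \ll \overline{f^2} \quad \text{for } i \neq j. \tag{8}$$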
If the set of elementary functions does not obey the conditions given
above, it is straightforward to transform the system into an orthonormal
one by means of the standard Gram-Schmidt orthonormalization procedure.
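The orthonormalization on the set of data can be sketched as follows. The sketch assumes that "orthonormal on the set of data" means that the mean of $f_i f_j$ over the observation times equals $\delta_{ij}$; the function name `gram_schmidt_on_data` and the polynomial example are hypothetical, and the modified Gram-Schmidt variant is used for numerical stability.

```python
import numpy as np

def gram_schmidt_on_data(F):
    """Orthonormalize the columns of F (N x q) so that mean(g_i * g_j) = delta_ij."""
    F = np.asarray(F, dtype=float)
    N, q = F.shape
    G = np.empty_like(F)
    for i in range(q):
        g = F[:, i].copy()
        for j in range(i):
            # Remove the component along the already orthonormalized g_j.
            g -= (G[:, j] @ g) / N * G[:, j]
        G[:, i] = g / np.sqrt(g @ g / N)     # normalize: mean(g**2) = 1
    return G

# Hypothetical elementary functions t, t^2, t^3 sampled at 40 times.
t = np.linspace(-1.0, 1.0, 40)
F = np.column_stack([t, t ** 2, t ** 3])
G = gram_schmidt_on_data(F)
gram = G.T @ G / len(t)                      # mean products of the new basis
```

Each column of `G` is a linear combination of the original columns of `F`, so the fitted light curves span the same function space as before.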

Assuming the observational data are distributed more or less uniformly
over the observational interval, it is recommended to use Legendre
polynomials orthonormal on the interval (-1; 1). If the object is
periodically variable, then the condition of quasi-orthogonality is
fulfilled by any combination of harmonic polynomials
$\sin(2k\pi t/P)$, $\cos(2k\pi t/P)$, $k = 1, 2, \ldots$, with
$\overline{f^2} = 1/2$.
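The statement about harmonic polynomials is easy to verify numerically for uniform phase coverage. In the sketch below the helper `harmonic_basis`, the period and the sampling are assumptions for illustration; for real, irregularly sampled data the off-diagonal terms are only approximately zero.

```python
import numpy as np

def harmonic_basis(t, period, n_harm):
    """Columns sin(2*pi*k*t/P), cos(2*pi*k*t/P) for k = 1..n_harm."""
    t = np.asarray(t, dtype=float)
    cols = []
    for k in range(1, n_harm + 1):
        arg = 2.0 * np.pi * k * t / period
        cols.append(np.sin(arg))
        cols.append(np.cos(arg))
    return np.column_stack(cols)

period = 1.0                                 # assumed period P
N = 50
t = period * np.arange(N) / N                # evenly spaced over one period
F = harmonic_basis(t, period, n_harm=2)
gram = F.T @ F / N                           # mean products of f_i and f_j
```

Here `gram` is one half of the identity matrix, i.e. the mean square of every function is 1/2 and the mean cross products vanish.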

If the functions of the linear regression model are quasi-orthonormal,
the uncertainties $\varepsilon_c$ of the particular components of the
vector $y_c$ are all the same:

$$\varepsilon_c = \frac{s_c}{\sqrt{N_c\, \overline{f^2}}}; \qquad w_c \sim \varepsilon_c^{-2} \sim \frac{N_c}{s_c^2}, \tag{9}$$

where $s_c$ is the standard deviation of the light curve fit and $N_c$ is
the number of observations in the particular colour used for the light
curve fit. The weight of the corresponding light curve vector in the
colour c will then be proportional to $\varepsilon_c^{-2}$.
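A short numerical check may help here. The sketch assumes that the common uncertainty takes the form $\varepsilon_c = s_c / \sqrt{N_c\,\overline{f^2}}$, which is what the usual LSM covariance $s_c^2 (F^T F)^{-1}$ gives for an exactly orthogonal basis; the simulated light curve, the noise level and the use of $N_c - q$ degrees of freedom in $s_c$ are assumptions of the example.

```python
import numpy as np

rng = np.random.default_rng(1)
N_c, period = 60, 1.0                        # assumed sampling of one colour
t = period * np.arange(N_c) / N_c
F = np.column_stack([np.sin(2 * np.pi * t / period),
                     np.cos(2 * np.pi * t / period)])
y_true = np.array([0.03, -0.01])             # hypothetical coefficients (mag)
dm = F @ y_true + 0.005 * rng.normal(size=N_c)   # simulated Delta m_c(t)

# Ordinary least-squares fit of the coefficient vector y_c.
y_c, *_ = np.linalg.lstsq(F, dm, rcond=None)
resid = dm - F @ y_c
q = F.shape[1]
s_c = np.sqrt(resid @ resid / (N_c - q))     # standard deviation of the fit

f2_mean = np.mean(F ** 2)                    # mean square of the basis, 1/2 here
eps_c = s_c / np.sqrt(N_c * f2_mean)         # common uncertainty of components
w_c = eps_c ** -2                            # weight of this light curve vector

cov = s_c ** 2 * np.linalg.inv(F.T @ F)      # full LSM covariance for comparison
```

For this orthogonal basis the square roots of the diagonal of `cov` coincide with `eps_c`, so a single number describes the uncertainty of the whole vector.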

The whole set of vectors describing the light curves in all p colours can 
be arranged into the pxq matrix Y, with the weights described by the 
pxp diagonal matrix W. 



3.2 Advanced PCA. Reducing free parameters. Usage of APCA

Let us assume that the variable part of the light curves in all colours,
$\Delta m_c$, can be approximated with sufficient accuracy by a linear
combination of only r (r < q) normalized orthogonal (principal) functions
$\varphi_j(t)$, each determined by a linear combination of all q
elementary functions $f_i(t)$ with coefficients forming the $q \times r$
matrix $B$:



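$$\varphi_j(t) = \sum_{i=1}^{q} b_{ij}\, f_i(t) = b_j^T f(t), \tag{10}$$

$$\Delta m_c(t) = \sum_{j=1}^{r} k_{cj}\, \varphi_j(t) = \sum_{j=1}^{r} k_{cj}\, b_j^T f(t) = k_c B^T f(t) = y_{sc}\, f(t), \tag{11}$$

$$y_{sc} = k_c B^T, \tag{12}$$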

where $b_j$ is the normalized vector of the j-th principal function and
the j-th column of the matrix $B$. The row vector $k_c$ ($1 \times r$)
represents the semiamplitude components of the light curve in colour c
with respect to the r principal functions
$\{\varphi_1(t), \ldots, \varphi_r(t)\}$, and the $1 \times q$ vector
$y_{sc}$ contains the parameters of the APCA-smoothed light curve in the
colour c.

Further we will assume that the vector basis $\{b_1, \ldots, b_r\}$ is
orthonormal; then:

$$b_i \cdot b_j = \delta_{ij}, \qquad B^T B = I. \tag{13}$$

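Minimizing the scalar quantity $S(B, k_c)$, defined as the sum of the
weighted variances of the differences $\Delta y_c = y_c - y_{sc}$:

$$S(B, k_c) = \sum_{c=1}^{p} \|\Delta y_c\|^2\, w_c = \sum_{c=1}^{p} \|y_c - k_c B^T\|^2\, w_c; \qquad S = \min \;\Rightarrow\; \delta S = 0, \tag{14}$$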

we arrive after some algebra at the following conclusions: 

$$K = Y B, \qquad (Y^T W Y)\, B = B\, (K^T W K) = B\, \Lambda. \tag{15}$$

$$Y_s = Y\, \mathcal{B} = Y\, (B B^T). \tag{16}$$

The results in equations (15) and (16) are formally identical with
(4)-(6), so we can conclude that the matrix $B$ contains r column vectors
which are the eigenvectors of the matrix $Y^T W Y$ corresponding to its r
largest eigenvalues. The smoothing matrix $\mathcal{B} = B B^T$ is defined
in the same way as $\mathcal{A}$.
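The APCA step itself can be sketched as an eigen-decomposition of $Y^T W Y$ applied to the uncentred coefficient vectors. The function name `apca`, the weights and the rank-2 test matrix below are hypothetical; the sketch covers only the linear-algebra core and leaves out the regression that produces $Y$ and any robust reweighting.

```python
import numpy as np

def apca(Y, w, r):
    """APCA core: r leading eigenvectors of Y^T W Y, with no centring of Y."""
    Y = np.asarray(Y, dtype=float)
    w = np.asarray(w, dtype=float)
    M = Y.T @ (w[:, None] * Y)               # q x q matrix Y^T W Y
    lam, vec = np.linalg.eigh(M)             # eigenvalues in ascending order
    order = np.argsort(lam)[::-1][:r]
    B, Lam = vec[:, order], lam[order]       # q x r basis and its eigenvalues
    K = Y @ B                                # p x r semiamplitudes (K = Y B)
    Y_s = K @ B.T                            # smoothed vectors (Y_s = Y B B^T)
    return B, Lam, K, Y_s

# Hypothetical noise-free case: p = 5 colours, q = 6 elementary functions,
# light curves built from exactly r = 2 principal functions.
rng = np.random.default_rng(2)
p, q, r = 5, 6, 2
B_true, _ = np.linalg.qr(rng.normal(size=(q, r)))   # orthonormal columns
K_true = rng.normal(size=(p, r))
Y_exact = K_true @ B_true.T
w = np.array([1.0, 2.0, 0.5, 1.5, 1.0])      # assumed weights of the colours
B, Lam, K, Y_s = apca(Y_exact, w, r)
```

In this noise-free case `Y_s` reproduces `Y_exact`, and the principal functions can be evaluated at any times as `F @ B`, where `F` holds the elementary functions column by column.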

Nevertheless, we have to emphasize that the advanced PCA is not identical
with the standard PCA or WPCA. APCA and PCA give very similar but not
identical results; the smoothing matrix $\mathcal{B}$ is not a duplicate
of $\mathcal{A}$. The main reason is that the data treated by PCA have
been centred on their mean, while in the case of APCA we work directly
with the fitted data without any centring. The difference is formulated
in the basic assumption of APCA (10), which seems to us physically better
justified than the postulates of PCA. The correctness of the APCA method
has been verified using several relevant statistical tests and trials
with simulated data.

Figure 2: Fitting of "multicolour observational" data by the advanced PCA
method (full lines). Dashed lines represent the fitting by the LSM.

We demonstrate the usage of APCA on synthetic photometric data simulating
the light variability of the same model of a rotating CP star. Fig. 2
displays the phase diagram of the multicolour light variations:
"observed" points are marked by full dots, and the light curves fitted by
the standard LSM technique are displayed by dashed lines. The light
curves found by APCA are plotted by full lines; they are
indistinguishable from the synthetic ones. The courses of the first two
principal curves are displayed in Fig. 3a, and the dependences of the
semiamplitudes for both principal light curves are plotted in Fig. 3b.
From the course of such a diagram for a real object we can draw
conclusions about the nature of its light variability.

APCA can be used for reliable prediction of the multicolour behaviour of
an object. The method is well suited to the quantification and
classification of light curves [4], [6], to multicolour O-C measurements
[5], and to the improvement of light ephemerides [3]. APCA also seems to
be a very efficient tool for the analysis of spectral variations and
radial velocity measurements [3].

4 Conclusions 

Principal Component Analysis, and especially Advanced PCA, proves to be a
universal, relatively simple method with an extremely wide range of uses,
particularly in the processing and interpretation of astronomical data
(both photometric and spectroscopic). The efficiency and applicability of
PCA grow when we combine it with other sophisticated methods of data
treatment, e.g. robust regression, weighted LSM and wavelet analysis.

The author is very indebted to Drs. Miloslav Zejda and Jan Janík for
careful reading of the manuscript and valuable comments and suggestions.
This investigation was supported by the Grant Agency of the Czech

Figure 3: (a) The courses of the first two principal functions. (b) The
dependence of semiamplitudes on wavelength in nm for both principal
components. Dashed lines: 1st principal component; dotted lines: 2nd
principal component.